Background: Cognitive training, a safe non-pharmacological intervention, may help mitigate cognitive decline 
and prevent the development of dementia in elderly individuals. 

Objective: Evaluate the long-term effects of cognitive training among healthy elderly community members. 

Methods: Healthy individuals 70 years of age or older from one urban community in Shanghai were screened 
and the 151 individuals who met inclusion criteria were assigned either to an intervention group (n=90) or a 
control group (n=61). The intervention involved twice-weekly training in reasoning, memory, and strategy that 
continued for 12 weeks (a total of 24 sessions). Participants were assessed at baseline, at 12 weeks, and 5 
years after enrollment using the Chinese versions of the Neuropsychological Test Battery for Elderly persons 
(NTBE), the Stroop Color-Word Test, and a general health questionnaire. 

Results: Forty-nine (54%) intervention group subjects and 33 (54%) control group subjects completed the 
5-year follow-up. There were few differences in the baseline neurocognitive measures of those who did and 
did not complete the 5-year follow-up, and there were few differences between those who dropped out of 
the intervention group compared to those who dropped out of the control group. At the 5-year follow-up, 
individuals in the intervention group performed better than those in the control group on only 5 measures (in 
the Trails Making A Test and the Cancellation Test 3) of the 61 measures assessed by NTBE and the Stroop tests, 
but none of these differences met the pre-determined required level of statistical significance (p<0.0008). 

Conclusion: We do not confirm the results of previous studies that report long-term benefits of brief cognitive 
training courses for elderly community residents. Our failure to identify differences in cognitive functioning 
five years after cognitive training is not likely due to differential dropout between the intervention and control 
groups but may be related to the relatively small sample and the large number of measures being assessed. 
Future intervention studies for cognitive training in the elderly should be hypothesis driven (i.e., focused on a 
single outcome measure of interest), use much larger samples, and include regular booster sessions as part of 
the cognitive training package. 
1. Introduction 

Signs of cognitive decline, including memory loss, 
decreased processing speed and difficulty concentrating, 
are commonly seen among elderly people. [1] Studies 
have found that standardized cognitive training can 
significantly delay cognitive decline and reduce the 
risk of dementia. [2-4] Most cognitive training studies in 
China [5] have focused on single cognitive domains such 
as memory, reasoning or processing speed. However, 
the content of such single-domain training is relatively 
dull and participants' interest and compliance may 
diminish after a few sessions. To address this issue, 
our team developed an integrated multi-domain 
cognitive training package tailored for elderly urban 
community members and administered the three- 
month intervention to a sample of elderly people 
in Shanghai in 2006. Previous reports on the study 
indicated that compared to cognitive functioning in a 
control group, individuals in the intervention group had 
better reasoning, memory and executive functioning 
at the end of the training and that these differences 
persisted for 1 year after the training. [6-8] The current 
paper reports on a 5-year follow-up assessment of the 
individuals enrolled in this project. 
2. Methods 

2.1 Sample 

The enrollment and follow-up of subjects for the study 
are shown in Figure 1. The sample comprised elderly residents 
of two neighborhoods of one of the 9 sub-districts of 
the Putuo District of Shanghai (one of Shanghai's 19 
districts). A total of 374 elderly community members 
from these neighborhoods were recruited by the 
neighborhood committees (i.e., local administrative 
offices) and screened from April to May 2006. Inclusion 
criteria were: (a) at least 70 years of age; (b) ability to 
self-care with no physical disability or severe physical 
disease; (c) no mental disorders; and (d) ability to read, 
write, see, and hear. A total of 151 elderly individuals 
met these inclusion criteria including 83 males and 68 
females; their age ranged from 70 to 89 years with a 
mean (sd) age of 74.8 (3.7) years. 

Recruitment took place at the offices of the 
neighborhood committee. In order to avoid possible 
contamination due to communication between 
participants in the intervention and control groups, 
assignment to the intervention or control groups was 
done sequentially. The first 50 screened individuals who 
met eligibility criteria were asked to participate in the 
intervention group, the next 50 screened individuals 
who met eligibility criteria were asked to participate in 
the control group, the third group of 50 screened who 
met eligibility criteria were invited to participate in the 
intervention group, and so forth. Using this process, 
90 individuals were recruited in the intervention group 
and 61 in the control group. There were no significant 
differences in gender (χ²=1.38, p=0.241), age (t=0.35, 
p=0.725), or educational level (χ²=0.39, df=3, p=0.942) 
between the two groups at baseline. This study was 
approved by the Ethics Committee of Tongji Hospital of 
Tongji University and all participants provided written 
informed consent. Five years after the intervention 
(February to March 2012), a total of 82 participants 
were followed up, including 49 in the intervention group 
and 33 in the control group. There were no significant 
differences between those who completed the 5-year 
follow-up from the intervention and control groups in 
gender (χ²=1.56, p=0.212), age (t=-0.05, p=0.959), or 
educational level (χ²=2.98, df=3, p=0.395). 
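
The alternating block assignment described in this section can be illustrated with a short sketch. The helper name `assign_by_blocks` and the example of 120 eligible individuals are hypothetical; the sketch shows only the alternation rule (consecutive blocks of 50 eligible individuals, intervention first) and does not attempt to reproduce the final group sizes of 90 and 61, which presumably also depended on who agreed to participate.

```python
from collections import Counter


def assign_by_blocks(n_eligible, block_size=50, arms=("intervention", "control")):
    """Assign consecutive blocks of eligible individuals to alternating arms.

    Hypothetical helper: individual i (0-based, in screening order) is placed
    in arms[(i // block_size) % len(arms)].
    """
    return [arms[(i // block_size) % len(arms)] for i in range(n_eligible)]


# Hypothetical example: 120 eligible individuals in screening order.
allocation = assign_by_blocks(120)

# Individuals at the edges of each block, to show where the arm switches.
for index in (0, 49, 50, 99, 100):
    print(index + 1, allocation[index])

# Number of individuals per arm under this purely illustrative scenario.
print(Counter(allocation))
```

Because assignment follows screening order rather than chance, this procedure is sequential rather than randomized, which is worth keeping in mind when reading the baseline comparisons.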

2.2 Assessment tools 

The Chinese version of the WHO Neuropsychological 
Test Battery for Elderly persons (NTBE) was used 
to evaluate eight domains of cognitive functioning: 
auditory verbal learning; sorting; cancellation; 
language; motor functioning; visual function; spatial 
construction; and trail making. [9,10] The test-retest 
correlation coefficients for the auditory verbal learning, 
cancellation, visual function, spatial construction and 
trail making subtests ranged from 0.64-0.92; the split-half 
correlation coefficient was 0.85; and the correlation 
coefficients between domains of NTBE ranged from 
0.10 to 0.42. [11] The Stroop Color-Word Test was used to 
assess executive functioning by testing the accuracy and 
speed of reading words in different colors. [12,13] 



Baseline assessments were conducted from June to 
September 2006. This included physical examinations, 
lab tests, NTBE, the Stroop test, and a general health 
questionnaire. 

2.3 Intervention 

Intervention group members received multi-domain 
cognitive training from October 2006 to January 2007. 
Two graduate student psychiatrists provided 24 face-to- 
face training sessions over this 12-week period to the 90 
individuals in the intervention group. The 90 individuals 
were divided into six groups of 15 individuals each for 
the training sessions. The length of each session was 
60 minutes. Participation in the sessions varied from a 
high of 97% (87/90) at the first session to a low of 63% 
(57/90) at the twenty-second session; the mean level 
of participation over the 24 sessions was 76%. Domains 
of training included reasoning (i.e., the identification 
of patterns in a group of words, numbers, or pictures), 
memory (i.e., memorizing pictures and words), problem 
solving (i.e., forming strategies for different tasks), and 
behavioral exercises (i.e., handwriting and handcrafts). 
Each session covered one domain. After each session 
participants provided feedback about the difficulty level, 
perceived usefulness, and interestingness of the session 
(information that was subsequently used to restructure 
the sessions). Between training sessions, participants in 
the intervention group were encouraged to do physical 
exercise and to finish the homework assigned during 
the session (including reading, calligraphy, painting, 
etc.). More details about the training can be found in 
our previous reports on this project. [14-16] Individuals in 
the control group did not receive any cognitive training. 

Three months after enrollment (i.e., at the end of 
the cognitive training in the intervention group) and 
9 months, 15 months and 63 months after enrollment 
all available individuals in the intervention and control 
groups were re-assessed using the same battery 
of instruments used at the baseline assessment. 
These evaluations were conducted by five graduate 
students in psychiatry who were trained in the use of 
the instruments and had good inter-rater reliability. 
The intraclass correlation coefficients for the various 
NTBE sub-tests when these five raters simultaneously 
assessed four anxious elderly inpatients were all 
above 0.80. These evaluators were blind to the group 
membership of the individuals they evaluated. 

Figure 1. Flowchart of the study 

[...] neighborhoods in the Putuo District of Shanghai from April to May in 2006 

Intervention group 
  197 potential participants in intervention group 
  107 excluded: 37 not located; 21 refused to participate; 11 sensory disability; 9 serious physical illness; 29 illiterate 
  90 in intervention group administered baseline assessment with Neuropsychological Test Battery for Elderly persons (NTBE) and the Stroop Color-Word Test 
  3-month intervention with 24 sessions of multi-dimension cognitive training 
  7 lost to follow-up: 1 refused; 2 moved away; 4 serious illness 
  83 repeat cognitive assessment 3 months after enrollment (at end of intervention) 
  6 lost to follow-up: 4 refused; 2 serious illness 
  77 repeat cognitive assessment 9 months after enrollment 
  4 lost to follow-up: 1 refused; 2 moved away; 1 died 
  73 repeat cognitive assessment 15 months after enrollment 
  24 lost to follow-up: 6 refused; 9 moved away; 6 serious illness; 3 died 
  49 repeat cognitive assessment 63 months after enrollment 

Control group 
  150 potential participants in control group 
  89 excluded: 28 not located; 24 refused to participate; 9 sensory disability; 6 serious physical illness; 22 illiterate 
  61 in control group administered baseline assessment with Neuropsychological Test Battery for Elderly persons (NTBE) and the Stroop Color-Word Test 
  10 lost to follow-up: 5 refused; 3 serious illness; 2 died 
  51 repeat cognitive assessment 3 months after enrollment 
  4 lost to follow-up: 2 moved away; 2 serious illness 
  47 repeat cognitive assessment 9 months after enrollment 
  2 lost to follow-up: 1 refused; 1 died 
  45 repeat cognitive assessment 15 months after enrollment 
  12 lost to follow-up: 4 refused; 2 moved away; 4 serious illness; 2 died 
  33 repeat cognitive assessment 63 months after enrollment 

Shanghai Archives of Psychiatry, 2014, Vol. 26, No. 1 

2.4 Statistical analysis 

Epidata 3.0 software was used for data entry and 
SPSS 17.0 software was used for data analyses. 
Descriptive statistics, chi-squared test, one sample 
t-test, Mann-Whitney Z-test (for continuous data that are 
not normally distributed), paired t-test and analysis 
of covariance (ANCOVA) were used depending on the 
type of data. Three separate analyses were conducted: 
(a) comparing baseline demographic and neuropsychological 
test results between the 82 individuals 
who completed the 5-year follow-up and the 69 who 
did not complete the 5-year follow-up; (b) comparing 
baseline characteristics and neuropsychological test 
results for the 41 intervention group individuals who did 
not complete the 5-year follow-up with the 28 control 
group individuals who did not complete the 5-year 
follow-up; and (c) comparing the characteristics and 
neuropsychological test results at baseline (adjusting 
for age and educational status), and at 3 months 
post-baseline and 63 months post-baseline (adjusting for 
age, educational status and baseline value) between 
the 49 individuals in the intervention group and the 
33 individuals in the control group who completed the 
5-year follow-up assessment. A total of 61 different 
measures (59 measures from the subtests on the NTBE 
and 2 measures from the Stroop Color-Word Test) 
were assessed at each time interval, so to limit possible 
bias due to multiple testing, the p-value for statistical 
significance was set at 0.0008 (i.e., 0.05 / 61). 
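
The corrected threshold above is a Bonferroni-style adjustment, and a short sketch may make the arithmetic concrete. The function names are illustrative, and the family-wise error rate calculation assumes the 61 tests are independent, which is a simplification (the text reports correlations of 0.10 to 0.42 between NTBE domains).

```python
ALPHA = 0.05      # conventional significance level for a single test
N_MEASURES = 61   # 59 NTBE measures + 2 Stroop Color-Word Test measures


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold obtained by dividing alpha by the number of tests."""
    return alpha / n_tests


def familywise_error_rate(per_test_alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive, ASSUMING independent tests."""
    return 1.0 - (1.0 - per_test_alpha) ** n_tests


threshold = bonferroni_threshold(ALPHA, N_MEASURES)
fwer_uncorrected = familywise_error_rate(ALPHA, N_MEASURES)
fwer_corrected = familywise_error_rate(threshold, N_MEASURES)

print(f"per-test threshold: {threshold:.4f}")
print(f"family-wise error rate without correction: {fwer_uncorrected:.3f}")
print(f"family-wise error rate with correction:    {fwer_corrected:.3f}")
```

Under the independence assumption, testing 61 measures at 0.05 each would almost guarantee at least one spurious difference, which is the motivation for the stricter per-test threshold.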

3. Results 

3.1 Comparison of baseline characteristics and 
neuropsychological test scores between those who 
did and did not complete the 5-year follow-up 

At the time of the 5-year follow-up it was only possible 
to evaluate 54% (49/90) of the individuals originally 
enrolled in the intervention group and 54% (33/61) of 
the individuals originally enrolled in the control group. In 
the intervention group 12 withdrew consent, 13 moved 
away (typically to live with children), 12 developed 
serious physical illnesses that precluded participation, 
and 4 died. In the control group 10 withdrew consent, 
4 moved away, 9 developed serious physical illnesses 
and 5 died. It is possible that some of these dropouts, 
particularly those that had serious illnesses or died, 
were directly or indirectly related to dementia or 
cognitive decline. This could potentially compromise the 
comparability of the remaining participants. 
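
The retention figures and dropout reasons reported above can be cross-checked with a few lines of arithmetic. This is only a consistency sketch using the counts stated in the text; the dictionary layout and function name are illustrative.

```python
# Counts taken from the text above.
enrolled = {"intervention": 90, "control": 61}
completed = {"intervention": 49, "control": 33}
dropout_reasons = {
    "intervention": {"withdrew consent": 12, "moved away": 13,
                     "serious physical illness": 12, "died": 4},
    "control": {"withdrew consent": 10, "moved away": 4,
                "serious physical illness": 9, "died": 5},
}


def retention_summary(group: str) -> tuple[int, int, float]:
    """Return (dropouts by subtraction, dropouts by summing reasons, retention rate)."""
    lost_by_subtraction = enrolled[group] - completed[group]
    lost_by_reason = sum(dropout_reasons[group].values())
    retention = completed[group] / enrolled[group]
    return lost_by_subtraction, lost_by_reason, retention


summaries = {group: retention_summary(group) for group in enrolled}
for group, (lost_a, lost_b, retention) in summaries.items():
    print(f"{group}: {lost_a} lost (reasons sum to {lost_b}), "
          f"retention {retention:.0%}")
```

Both ways of counting dropouts agree within each group, and the two retention rates round to the same percentage.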

To determine whether or not individuals we evaluated 
five years after the intervention were representative 
of all enrolled individuals, we compared the baseline 
demographic characteristics and neuropsychological results 
of the 82 individuals who completed the 5-year evaluation 
and the 69 individuals who did not complete the 5-year 
evaluation. There were no significant differences 
between these two groups by gender (χ²=1.63, p=0.197), 
age (t=1.32, p=0.192), or educational level (χ²=2.05, 
df=3, p=0.541). Five of the 61 neuropsychological 
measures were different at baseline between those 
who did and did not complete the 5-year follow-up: 
compared to those who dropped out during the five 
years of follow-up, at baseline those who completed 
the 5-year evaluation had fewer correct responses 
on Cancellation Test 1 (24.23 [2.76] v. 25.04 [1.49], 
t=2.17, p=0.032) and Cancellation Test 2 (10.75 [0.47] v. 
10.89 [0.35], t=2.00, p=0.048), more missing items on 
Cancellation Test 2 (0.26 [0.47] v. 0.11 [0.35], Z=2.41, 
p=0.016), and required more reminders during the 
Trails Making A test (0.56 [0.97] v. 0.29 [0.66], Z=2.29, 
p=0.022) and during the Trails Making B test (1.00 [1.56] 
v. 0.50 [1.26], Z=2.11, p=0.009). Given the large number 
of tests that were compared, none of these differences 
were considered statistically significant. 
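
As a rough check on how such comparisons work, a t statistic can be recomputed from the reported means, standard deviations and group sizes. The sketch below assumes an equal-variance (pooled) two-sample t-test, which the text does not explicitly specify, and uses the Cancellation Test 1 values reported above; the function name is illustrative.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled variance (ASSUMES equal variances).

    Returns the t statistic and its degrees of freedom.
    """
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, n1 + n2 - 2


# Cancellation Test 1, correct responses at baseline:
# non-completers 25.04 (1.49), n=69; completers 24.23 (2.76), n=82.
t_stat, df = pooled_t(25.04, 1.49, 69, 24.23, 2.76, 82)
print(f"t = {t_stat:.2f}, df = {df}")
```

Under this assumption the recomputed statistic lands close to the reported t=2.17; a small gap is expected because the published means and standard deviations are rounded.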

We also compared the baseline characteristics of 
the 41 individuals who dropped out of the intervention 
group over the five years with those of the 28 individuals 
who dropped out of the control group over the five 
years. There were no significant differences between 
the groups by gender (χ²=0.15, p=0.808), age (t=0.57, 
p=0.569), or educational level (χ²=1.18, df=3, p=0.241). 
Only two of the 61 neuropsychological measures 
were different at baseline: those who dropped out of 
the intervention group had more inserted responses 
in Auditory Verbal Learning Test 1 than those who 
dropped out of the control group (0.49 [0.75] v. 0.11 
[0.32], Z=2.46, p=0.014), but those who dropped out 
of the control group had more inserted responses to 
the Auditory Verbal Learning Test 6 (1.11 [1.23] v. 0.46 
[0.81], Z=2.66, p=0.008). Here, again, given the large 
number of tests considered, neither of these differences 
was considered statistically significant. 

3.2 Comparison of neuropsychological test results at 
baseline, at the end of the 3-month intervention 
and at 5-year follow-up for the 49 intervention- 
group subjects and 33 control-group subjects who 
completed the 5-year follow-up 

The comparison of the baseline, 3-month and 5-year 
neuropsychological test results for the 49 individuals 
from the intervention group and the 33 individuals from 
the control group who completed the 5-year follow-up 
are shown in Table 1. 

Results for 5 of the 61 assessed measures suggest 
that individuals in the intervention group who 
completed the 5-year follow-up had somewhat better 
baseline functioning than individuals in the control group 
who completed the 5-year follow-up. After controlling 
for age and educational level, compared to individuals in 
the control group, at baseline those in the intervention 
group had more correct responses on Cancellation 
Test 1 (p=0.006), fewer missing items on Cancellation 
Test 1 (p=0.005) and Cancellation Test 3 (p=0.040), and 
better results on the Contact Function Test (p=0.029) 
and the Semantic Relations Test (p=0.003). But none of 
these differences reached our pre-determined level of 
statistical significance (p<0.0008). 

At the end of the three-month intervention (or 3 
months after enrollment in the control group), after 
controlling for age, educational level and baseline level 
of the measure, 6 of the 61 test scores suggested that 
the cognitive functioning of the intervention group 
was better than that of the control group: compared 
to the control group, individuals in the intervention 
group recalled more items on the Naming Recall Test 
(p=0.020), had a higher score on the Semantic Relations 
Test (p=0.020), performed better on the Visual Matching 
and Reasoning Test (p=0.036), had fewer duplicated 
responses in the Auditory Verbal Learning Test 7 
(p=0.024), had fewer errors on Trails Making Test A 
(p=0.033), and had less color interference in the Stroop 
Color-Word Test (p=0.017). 
At the time of the 5-year follow-up, after adjusting 
for age, educational level and the baseline value for 
the measure, intervention group individuals performed 
better than control group individuals on three measures 
of the Trails Making A Test: fewer errors (p=0.041), 
fewer minor errors (p=0.026), and fewer reminders 
(p=0.045). They also performed better on two measures 
of Cancellation Test 3: more correct responses 
(p=0.015) and fewer missed items (p=0.018). None of 
these differences reached our pre-determined level of 
statistical significance (p<0.0008). 
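
To see how the five 5-year differences relate to the corrected threshold, the reported p-values can be classified directly. This is an illustrative sketch: the labels and the `classify` helper are not from the study, and the expected number of chance findings assumes 61 independent tests with no true effects.

```python
N_MEASURES = 61
NOMINAL_ALPHA = 0.05
THRESHOLD = NOMINAL_ALPHA / N_MEASURES  # corrected threshold, about 0.0008

# p-values reported in the text for the 5-year follow-up.
reported_5yr = {
    "Trails Making A: errors": 0.041,
    "Trails Making A: minor errors": 0.026,
    "Trails Making A: reminders": 0.045,
    "Cancellation Test 3: correct responses": 0.015,
    "Cancellation Test 3: missed items": 0.018,
}


def classify(p: float) -> str:
    """Label a p-value relative to the nominal and corrected thresholds."""
    if p < THRESHOLD:
        return "significant after correction"
    if p < NOMINAL_ALPHA:
        return "nominal only"
    return "not significant"


labels = {measure: classify(p) for measure, p in reported_5yr.items()}
for measure, label in labels.items():
    print(f"{measure}: p={reported_5yr[measure]:.3f} -> {label}")

# Nominal 'hits' expected by chance alone, ASSUMING independent tests.
expected_false_positives = N_MEASURES * NOMINAL_ALPHA
print(f"expected nominal findings by chance: {expected_false_positives:.2f}")
```

Five nominal differences out of 61 measures is not far from the roughly three that chance alone would produce under these assumptions, which is consistent with the cautious reading given in the text.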



Table 1. Comparison of scores of 59 neuropsychological measures from the Neuropsychological Test Battery 
for Elderly persons and 2 measures from the Stroop Color-Word Test between the 49 individuals in 
the cognitive training intervention group and 33 individuals in the control group who completed all 
three evaluations at baseline, 3 months post-enrollment and 63 months post-enrollment 

Item (measure) 
    Time of       Intervention    Control         ANCOVA F-value (p-value), 
    evaluation    group           group           adjusted for age, education and 
                  mean (sd)       mean (sd)       (for 3rd and 63rd month) baseline result 

Neuropsychological Test Battery for Elderly persons 

Auditory Verbal Learning Test 1 (number of correct responses) 
    baseline      4.63 (1.62)     4.97 (2.07)     1.45 (0.232) 
    3rd month     5.88 (2.00)     6.15 (1.89)     1.14 (0.289) 
    63rd month    5.98 (2.25)     6.64 (2.40)     1.46 (0.231) 

Auditory Verbal Learning Test 1 (number of duplicated responses) 
    baseline      0.31 (0.77)     0.03 (0.17)     3.89 (0.052) 
    3rd month     0.43 (0.68)     0.48 (0.67)     1.64 (0.204) 
    63rd month    0.47 (0.82)     0.45 (0.87)     0.12 (0.734) 

Auditory Verbal Learning Test 1 (number of inserted responses) 
    baseline      0.51 (1.26)     0.27 (0.52)     1.35 (0.250) 
    3rd month     0.69 (1.00)     0.36 (0.60)     2.34 (0.130) 
    63rd month    0.31 (0.85)     0.30 (0.68)     0.22 (0.645) 

Auditory Verbal Learning Test 2 (number of correct responses) 
    baseline      7.18 (2.10)     6.67 (1.87)     0.52 (0.474) 
    3rd month     8.53 (2.50)     8.18 (2.46)     0.05 (0.832) 
    63rd month    7.61 (2.17)     8.15 (2.61)     2.98 (0.089) 

Auditory Verbal Learning Test 2 (number of duplicated responses) 
    baseline      0.57 (0.82)     0.52 (0.76)     0.00 (0.949) 
    3rd month     0.82 (1.17)     0.94 (1.44)     0.40 (0.528) 
    63rd month    0.61 (1.15)     0.73 (0.98)     0.20 (0.656) 

Auditory Verbal Learning Test 2 (number of inserted responses) 
    baseline      0.29 (0.82)     0.12 (0.42)     1.65 (0.202) 
    3rd month     0.47 (0.84)     0.27 (0.52)     0.75 (0.389) 
    63rd month    0.35 (0.86)     0.24 (0.61)     0.38 (0.541) 

Auditory Verbal Learning Test 3 (number of correct responses) 
    baseline      8.71 (2.18)     8.27 (2.39)     0.08 (0.783) 
    3rd month     9.71 (2.49)     9.61 (2.42)     0.15 (0.698) 
    63rd month    8.63 (2.57)     8.70 (2.47)     0.50 (0.483) 

Auditory Verbal Learning Test 3 (number of duplicated responses) 
    baseline      1.12 (1.09)     0.82 (1.04)     1.41 (0.239) 
    3rd month     1.00 (1.32)     1.27 (1.70)     1.26 (0.264) 
    63rd month    1.06 (1.65)     0.61 (1.04)     1.42 (0.237) 

Auditory Verbal Learning Test 3 (number of inserted responses) 
    baseline      0.14 (0.35)     0.36 (0.65)     3.32 (0.072) 
    3rd month     0.43 (0.91)     0.21 (0.49)     1.62 (0.207) 
    63rd month    0.24 (0.63)     0.27 (0.52)     0.08 (0.776) 

Auditory Verbal Learning Test 4 (number of correct responses) 
    baseline      9.45 (2.32)     9.06 (2.37)     0.18 (0.674) 
    3rd month     10.47 (2.32)    10.27 (2.66)    0.08 (0.776) 
    63rd month    9.45 (2.81)     9.64 (2.50)     1.13 (0.291) 

Auditory Verbal Learning Test 4 (number of duplicated responses) 
    baseline      1.02 (1.41)     1.00 (1.32)     0.00 (0.960) 
    3rd month     1.16 (1.25)     1.45 (1.73)     1.12 (0.293) 
    63rd month    1.02 (1.71)     0.82 (1.04)     0.64 (0.426) 

Auditory Verbal Learning Test 4 (number of inserted responses) 
    baseline      0.29 (0.54)     0.18 (0.39)     1.02 (0.315) 
    3rd month     0.49 (0.82)     0.39 (0.70)     0.57 (0.453) 
    63rd month    0.18 (0.57)     0.33 (0.65)     1.61 (0.208) 
Table 1 (cont'd) 

Item (measure) 
    Time of       Intervention    Control         ANCOVA F-value (p-value), 
    evaluation    group           group           adjusted for age, education and 
                  mean (sd)       mean (sd)       (for 3rd and 63rd month) baseline result 

Neuropsychological Test Battery for Elderly persons 

Auditory Verbal Learning Test 5 (number of correct responses) 
    baseline      10.27 (2.53)    9.73 (2.35)     0.44 (0.510) 
    3rd month     11.12 (2.51)    10.24 (2.59)    0.84 (0.362) 
    63rd month    10.02 (2.78)    9.79 (2.51)     0.02 (0.879) 

Auditory Verbal Learning Test 5 (number of duplicated responses) 
    baseline      1.57 (1.80)     1.33 (1.34)     0.30 (0.586) 
    3rd month     1.80 (2.20)     1.42 (1.68)     0.35 (0.555) 
    63rd month    0.96 (1.26)     0.91 (1.16)     0.08 (0.776) 

Auditory Verbal Learning Test 5 (number of inserted responses) 
    baseline      0.27 (0.61)     0.12 (0.33)     2.42 (0.124) 
    3rd month     0.55 (0.84)     0.36 (0.55)     0.73 (0.396) 
    63rd month    0.29 (0.79)     0.21 (0.55)     0.00 (0.994) 

Sorting Test (score) 
    baseline      9.45 (0.74)     9.58 (0.97)     0.77 (0.384) 
    3rd month     9.71 (0.61)     9.33 (1.02)     3.49 (0.066) 
    63rd month    9.67 (0.66)     9.48 (0.91)     1.12 (0.294) 

Cancellation Test 1 (number of correct responses) 
    baseline      25.43 (1.29)    24.45 (1.60)    7.94 (0.006) 
    3rd month     25.33 (1.31)    24.79 (2.70)    0.19 (0.667) 
    63rd month    24.65 (2.92)    24.24 (3.67)    0.00 (0.985) 

Cancellation Test 1 (number of missing items) 
    baseline      0.55 (1.29)     1.55 (1.60)     8.31 (0.005) 
    3rd month     0.67 (1.31)     1.21 (2.70)     0.18 (0.671) 
    63rd month    1.31 (2.92)     1.55 (3.64)     0.04 (0.840) 

Cancellation Test 1 (number of incorrect responses) 
    baseline      0.06 (0.32)     0.45 (1.92)     2.88 (0.094) 
    3rd month     0.04 (0.29)     0.27 (0.91)     3.46 (0.067) 
    63rd month    0.08 (0.34)     0.30 (0.98)     1.83 (0.180) 

Cancellation Test 1 (completion time in seconds) 
    baseline      56.22 (19.34)   50.61 (15.34)   3.03 (0.086) 
    3rd month     49.78 (15.78)   50.24 (14.15)   0.34 (0.564) 
    63rd month    56.51 (24.40)   54.61 (23.79)   0.31 (0.578) 

Cancellation Test 2 (number of correct responses) 
    baseline      10.90 (0.37)    10.88 (0.33)    0.00 (0.927) 
    3rd month     10.86 (0.41)    10.88 (0.33)    0.22 (0.644) 
    63rd month    11.86 (7.18)    10.55 (1.25)    1.33 (0.252) 

Cancellation Test 2 (number of missing items) 
    baseline      0.10 (0.37)     0.12 (0.33)     0.00 (0.927) 
    3rd month     0.14 (0.41)     0.09 (0.29)     0.80 (0.373) 
    63rd month    0.16 (0.37)     0.45 (1.25)     1.92 (0.170) 

Cancellation Test 2 (number of incorrect responses) 
    baseline      0.20 (1.02)     0.12 (0.55)     0.35 (0.558) 
    3rd month     0.14 (0.61)     0.00 (0.00)     2.22 (0.140) 
    63rd month    0.04 (0.29)     0.18 (0.58)     2.06 (0.155) 

Cancellation Test 2 (completion time in seconds) 
    baseline      27.35 (12.61)   27.39 (11.65)   0.02 (0.891) 
    3rd month     28.82 (11.29)   26.70 (10.57)   1.19 (0.278) 
    63rd month    32.92 (24.58)   29.15 (16.21)   0.71 (0.402) 
Table 1 (cont'd) 

Item (measure) 
    Time of       Intervention    Control         ANCOVA F-value (p-value), 
    evaluation    group           group           adjusted for age, education and 
                  mean (sd)       mean (sd)       (for 3rd and 63rd month) baseline result 

Neuropsychological Test Battery for Elderly persons 

Cancellation Test 3 (number of correct responses) 
    baseline      17.24 (2.18)    15.94 (3.76)    2.74 (0.102) 
    3rd month     17.45 (1.65)    16.70 (2.07)    1.12 (0.294) 
    63rd month    17.35 (2.11)    15.42 (3.15)    6.24 (0.015) 

Cancellation Test 3 (number of missing items) 
    baseline      1.55 (2.04)     3.09 (3.74)     4.36 (0.040) 
    3rd month     1.37 (1.55)     2.30 (2.07)     2.06 (0.156) 
    63rd month    1.51 (1.89)     3.33 (3.11)     5.83 (0.018) 

Cancellation Test 3 (number of incorrect responses) 
    baseline      0.16 (0.83)     0.48 (1.72)     0.73 (0.397) 
    3rd month     0.12 (0.63)     0.21 (1.05)     0.34 (0.562) 
    63rd month    0.12 (0.60)     0.27 (0.91)     0.60 (0.441) 

Cancellation Test 3 (completion time in seconds) 
    baseline      44.51 (14.71)   40.88 (17.00)   1.27 (0.263) 
    3rd month     44.65 (16.29)   41.94 (10.19)   0.27 (0.607) 
    63rd month    51.55 (34.45)   43.21 (19.55)   0.82 (0.368) 

Articulation Test (score) 
    baseline      11.71 (0.74)    11.76 (0.66)    0.17 (0.681) 
    3rd month     11.67 (0.94)    11.82 (0.64)    1.25 (0.266) 
    63rd month    11.80 (0.82)    11.73 (0.88)    0.17 (0.678) 

Naming Test (number of items named) 
    baseline      7.00 (0.00)     6.97 (0.17)     1.81 (0.183) 
    3rd month     6.98 (0.14)     7.00 (0.00)     0.28 (0.597) 
    63rd month    7.00 (0.00)     7.00 (0.00) 

Naming Recall Test (number of items recalled) 
    baseline      5.65 (1.25)     5.91 (1.13)     1.54 (0.219) 
    3rd month     6.02 (0.83)     5.52 (0.97)     5.65 (0.020) 
    63rd month    4.76 (1.18)     5.00 (1.23)     1.88 (0.174) 

Verbal Fluency Test (number of animals named) 
    baseline      lb.bl (4.o4)    lb. 03 (o.iJb)  1.64 (0.2Ub) 
    3rd month     16.41 (4.59)    16.18 (4.42)    1.93 (0.169) 
    63rd month    14.73 (4.72)    14.09 (4.34)    0.55 (0.463) 

Verbal Fluency Test (number of surnames mentioned) 
    baseline      17.78 (7.14)    16.18 (5.87)    0.50 (0.483) 
    3rd month     18.18 (6.09)    16.48 (4.95)    0.33 (0.568) 
    63rd month    18.02 (7.01)    16.97 (6.13)    0.15 (0.697) 

Verbal Fluency Test (number of vegetables named) 
    baseline      14.18 (4.18)    14.73 (2.79)    0.34 (0.546) 
    3rd month     14.49 (4.22)    15.58 (3.90)    1.03 (0.314) 
    63rd month    13.82 (4.28)    15.06 (4.46)    1.84 (0.178) 

Mini-Token Test (score) 
    baseline      2.49 (0.94)     2.21 (1.08)     0.36 (0.550) 
    3rd month     2.65 (0.97)     2.48 (0.94)     0.02 (0.890) 
    63rd month    2.51 (0.96)     2.06 (1.00)     2.43 (0.123) 

Motor Test (score) 
    baseline      23.90 (4.70)    24.30 (3.26)    0.43 (0.513) 
    3rd month     25.94 (2.15)    24.73 (5.27)    2.64 (0.108) 
    63rd month    26.22 (3.22)    26.15 (2.15)    0.15 (0.699) 

Contact Function Test (score) 
    baseline      3.51 (0.68)     3.06 (0.83)     4.92 (0.029) 
    3rd month     3.51 (0.65)     3.24 (0.83)     0.24 (0.628) 
    63rd month    3.69 (0.51)     3.45 (0.62)     0.18 (0.675) 
Table 1 (cont'd) 

Item (measure) 
    Time of       Intervention    Control         ANCOVA F-value (p-value), 
    evaluation    group           group           adjusted for age, education and 
                  mean (sd)       mean (sd)       (for 3rd and 63rd month) baseline result 

Neuropsychological Test Battery for Elderly persons 

Semantic Relations Test (score) 
    baseline      3.33 (0.99)     2.55 (1.25)     9.14 (0.003) 
    3rd month     3.37 (0.70)     2.76 (0.97)     5.67 (0.020) 
    63rd month    3.55 (0.65)     3.27 (0.88)     0.90 (0.345) 

Recognition (number correctly identified items) 
    baseline      6.24 (1.27)     5.88 (1.29)     1.49 (0.227) 
    3rd month     6.16 (1.56)     6.00 (1.32)     0.00 (0.991) 
    63rd month    5.57 (1.28)     5.76 (1.15)     0.27 (0.602) 

Recognition (number incorrect items identified) 
    baseline      1.18 (0.91)     1.21 (0.96)     0.00 (0.997) 
    3rd month     1.18 (1.07)     1.09 (0.77)     0.15 (0.702) 
    63rd month    1.37 (1.24)     1.52 (1.23)     0.37 (0.547) 

Visual Matching and Reasoning (score) 
    baseline      5.27 (2.10)     4.48 (1.50)     1.97 (0.165) 
    3rd month     6.18 (1.87)     4.82 (2.14)     4.56 (0.036) 
    63rd month    5.69 (2.07)     4.52 (1.79)     3.21 (0.077) 

Auditory Verbal Learning Test 6 (number of correct responses) 
    baseline      8.94 (3.09)     7.88 (3.25)     1.62 (0.208) 
    3rd month     10.22 (2.93)    9.21 (3.90)     0.16 (0.688) 
    63rd month    9.06 (3.78)     8.39 (3.98)     0.01 (0.913) 

Auditory Verbal Learning Test 6 (number of duplicated responses) 
    baseline      0.92 (1.30)     0.58 (1.00)     1.41 (0.238) 
    3rd month     1.43 (1.88)     1.09 (1.59)     0.16 (0.692) 
    63rd month    1.02 (1.44)     0.91 (1.38)     0.08 (0.784) 

Auditory Verbal Learning Test 6 (number of inserted responses) 
    baseline      1.20 (1.58)     0.94 (1.06)     1.53 (0.220) 
    3rd month     1.04 (0.98)     0.88 (1.02)     0.20 (0.658) 
    63rd month    0.78 (1.20)     0.48 (0.67)     0.82 (0.369) 

Auditory Verbal Learning Test 7 (number of correct responses) 
    baseline      3.80 (1.77)     3.61 (1.37)     0.12 (0.727) 
    3rd month     3.98 (1.71)     3.79 (1.93)     0.03 (0.857) 
    63rd month    3.33 (1.55)     2.94 (1.12)     1.32 (0.255) 

Auditory Verbal Learning Test 7 (number of duplicated responses) 
    baseline      0.39 (0.79)     0.30 (0.59)     0.10 (0.757) 
    3rd month     0.24 (0.55)     0.55 (0.97)     5.30 (0.024) 
    63rd month    0.24 (0.60)     0.18 (0.47)     0.46 (0.501) 

Auditory Verbal Learning Test 7 (number of inserted responses) 
    baseline      0.84 (1.21)     1.00 (1.20)     0.24 (0.624) 
    3rd month     0.84 (1.14)     0.82 (0.92)     0.27 (0.607) 
    63rd month    0.88 (1.25)     0.76 (1.17)     0.66 (0.418) 

Auditory Verbal Learning Test 8 (number of correct responses) 
    baseline      7.78 (3.31)     7.33 (3.31)     0.04 (0.839) 
    3rd month     9.39 (3.70)     8.76 (3.89)     0.03 (0.858) 
    63rd month    8.31 (3.56)     8.30 (3.75)     0.29 (0.590) 

Auditory Verbal Learning Test 8 (number of duplicated responses) 
    baseline      0.69 (1.18)     0.48 (0.97)     0.46 (0.501) 
    3rd month     0.71 (1.32)     0.48 (0.83)     0.61 (0.437) 
    63rd month    0.51 (1.21)     0.76 (1.17)     0.69 (0.408) 

Auditory Verbal Learning Test 8 (number of inserted responses) 
    baseline      1.24 (1.69)     0.79 (0.82)     2.90 (0.092) 
    3rd month     0.98 (1.01)     0.94 (1.03)     0.00 (0.949) 
    63rd month    0.84 (1.43)     0.48 (0.76)     0.75 (0.391) 
Table 1. Comparison of scores of 59 neuropsychological measures from the Neuropsychological Test Battery 
for Elderly persons and 2 measures from the Stroop Color-Word Test between the 49 individuals in 
the cognitive training intervention group and 33 individuals in the control group who completed all 
three evaluations at baseline, 3 months post-enrollment and 63 months post-enrollment (cont'd) 

Item (measure)
    Time of       Intervention      Control           ANCOVA adjusted for age, education and
    evaluation    group             group             (for 3rd and 63rd month) baseline result
                  mean (sd)         mean (sd)         F-value (p-value)

Neuropsychological Test Battery for Elderly persons

Spatial Construction Test (score)
    baseline      5.45 (1.12)       5.27 (1.10)       0.17 (0.678)
    3rd month     5.47 (0.83)       5.03 (1.38)       1.71 (0.194)
    63rd month    5.61 (0.76)       5.30 (1.13)       0.97 (0.328)


Trails Making Test A (completion time in seconds)
    [values partly illegible in source; legible entries listed in source order]
    Intervention group mean (sd):   114.49 (58.32); [illegible]; 107.84 (54.40)
    Control group mean (sd):        123.39 (47.90); 120.12 (56.80)
    F-value (p-value):              0.05 (0.828); [illegible]; 0.24 (0.627)

Trails Making Test A (number of errors)
    [values partly illegible in source; legible entries listed in source order]
    Intervention group mean (sd):   0.47 (1.17); 0.08 (0.28)
    Control group mean (sd):        0.52 (0.91); [illegible]; 0.36 (0.82)
    F-value (p-value):              0.08 (0.784); [illegible]; 4.32 (0.041)

Trails Making Test A (number of times needed reminding)
    [values partly illegible in source; legible entries listed in source order]
    Intervention group mean (sd):   0.31 (0.68); [illegible]; 0.19 (0.67)
    Control group mean (sd):        0.27 (0.63); 0.61 (1.22)
    F-value (p-value):              0.24 (0.627); [illegible]; 4.16 (0.045)

Trails Making Test A (number of minor errors)
    baseline      0.16 (0.43)       0.14 (0.50)       0.38 (0.541)
    3rd month     0.16 (0.37)       0.15 (0.36)       0.05 (0.818)
    63rd month    0.04 (0.20)       0.30 (0.68)       5.14 (0.026)

Trails Making Test B (completion time in seconds)
    baseline      207.02 (98.69)    221.79 (79.56)    0.02 (0.891)
    3rd month     186.51 (83.60)    204.79 (82.90)    0.18 (0.670)
    63rd month    213.49 (146.72)   233.88 (109.82)   0.02 (0.888)

Trails Making Test B (number of errors)
    baseline      0.82 (1.56)       0.67 (1.53)       0.64 (0.425)
    3rd month     0.37 (0.86)       0.82 (1.21)       3.10 (0.082)
    63rd month    0.33 (1.00)       0.79 (1.80)       1.31 (0.257)

Trails Making Test B (number of times needed reminding)
    baseline      0.57 (1.51)       0.39 (0.75)       0.92 (0.341)
    3rd month     0.35 (0.75)       0.73 (1.07)       3.43 (0.068)
    63rd month    0.77 (1.33)       1.18 (1.29)       1.74 (0.192)

Trails Making Test B (number of minor errors)
    baseline      0.22 (0.55)       0.18 (0.47)       0.68 (0.412)
    3rd month     0.20 (0.41)       0.21 (0.42)       0.03 (0.869)
    63rd month    0.27 (0.61)       0.36 (0.82)       0.07 (0.788)

STROOP COLOR-WORD TEST

Color Interference (score)
    baseline      47.00 (26.77)     47.39 (21.04)     1.41 (0.239)
    3rd month     39.55 (14.71)     45.03 (24.68)     5.94 (0.017)
    63rd month    49.52 (20.71)     45.79 (35.65)     1.96 (0.166)

Word Interference (score)
    baseline      18.08 (13.55)     15.09 (15.69)     0.04 (0.840)
    3rd month     20.22 (13.35)     23.91 (14.89)     0.89 (0.349)
    63rd month    25.58 (17.75)     29.00 (21.12)     3.74 (0.057)
4. Discussion 

4.1 Main findings 

Among individuals who were followed up five years 
after enrollment, after adjusting for baseline cognitive 
functioning, age and education, there were no statistically 
significant differences on a wide range of neurocognitive tests 
between those who had received a three-month 
cognitive training program and those who had not 
received the training. These results are based on 54% 
follow-up of the original sample so the failure to find a 
difference could be due to differential dropout from the 
two groups. But the proportion of dropouts was identical 
in the two groups (46%) and we found few differences 
in the baseline characteristics and neuropsychological 
profile of those who completed the study versus those 
who dropped out. Moreover, there were also few 
differences between those who dropped out from the 
intervention group versus those who dropped out of the 
control group. These results strongly suggest that the 
lack of differences in neuropsychological functioning five 
years after the three-month cognitive training course is 
real; it is not likely due to differential dropout rates in 
the two groups. 
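
The dropout comparison above can be checked with simple arithmetic. The following is a minimal sketch and was not part of the original analysis: it derives the dropout counts from the reported group sizes (90 and 61 enrolled; 49 and 33 completers) and applies a pooled two-proportion z-test. The function name `two_proportion_z` is illustrative, and the choice of test is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Dropout counts derived from the reported group sizes (enrolled minus completers).
dropout_int, n_int = 90 - 49, 90
dropout_ctl, n_ctl = 61 - 33, 61

z, p = two_proportion_z(dropout_int, n_int, dropout_ctl, n_ctl)
print(f"intervention dropout: {dropout_int / n_int:.1%}")
print(f"control dropout:      {dropout_ctl / n_ctl:.1%}")
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

Under these assumptions both dropout proportions round to 46% and the test gives no indication that the rates differ. Equal rates do not by themselves rule out differences in the reasons for dropout, which is why the baseline profiles of the dropouts were also compared.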

Thus our results do not confirm results from 
other countries that report long-term effectiveness 
of cognitive training. For example, Ball and colleagues 
randomly assigned 2832 community members aged 65 
to 94 years into a memory training group, a reasoning 
training group, a processing speed training group and a 
control group; after 6 weeks of cognitive training, scores 
on corresponding cognitive functioning tests were 
improved and participants' cognitive functioning and 
daily functioning were significantly better than those of 
the control group five years after the intervention.[17-19] 

Why were we unable to replicate Ball's findings? 
The sample size of Ball's study was more than 18-fold larger 
than the sample size in our study, so that study had the 
statistical power to identify small differences that we 
could not identify. Ball's study had younger participants 
(starting at 65 years of age while our study started at 70 
years of age), so it is possible that a smaller proportion 
of the subjects in that study were affected by age-related 
decline (i.e., mild cognitive impairment) that would 
swamp the positive effects of a short cognitive training 
program. Most importantly, 60% of the participants 
in Ball's study were given reinforcement training 11 
months and 35 months after the initial cognitive 
training, limiting the attenuation of the training effect 
over time; we did not conduct any booster sessions 
over the five-year follow-up period. The younger age of 
participants and the use of booster training sessions in Ball's 
study were probably also factors in the higher 5-year 
follow-up rate in that study compared to ours (67% vs. 
54%). 
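
The effect of sample size on statistical power can be illustrated with a rough calculation. The sketch below is an assumption-laden approximation, not the ANCOVA used in this study: it treats the comparison as a two-sided, two-sample z-test without covariates and asks how large a standardized mean difference (Cohen's d) would be needed for 80% power. The function name `min_detectable_d` is illustrative, and the arms of 700 are a hypothetical stand-in for a much larger trial, not figures taken from Ball's reports.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n1, n2, alpha=0.05, power=0.80):
    """Smallest standardized difference detectable by a two-sided,
    two-sample z-test (normal approximation, no covariate adjustment)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) * sqrt(1 / n1 + 1 / n2)


# Completers in this study: 49 intervention, 33 control.
d_ours = min_detectable_d(49, 33)
# Hypothetical large trial: 700 participants per arm (illustrative only).
d_large = min_detectable_d(700, 700)

print(f"49 vs 33 completers: d >= {d_ours:.2f}")
print(f"700 per arm:         d >= {d_large:.2f}")
```

On this approximation the present sample could reliably detect only moderate-to-large effects, whereas a trial with several hundred participants per arm could detect effects roughly a quarter of that size.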

4.2 Limitations 

The main limitation in this study is the relatively small 
sample size, which was magnified by the high dropout 
rate (46%) at the time of the 5-year follow-up. Given 
the comprehensive battery of neuropsychological 
tests conducted (with 61 independent measures), an 
initial sample size of 151 individuals and a follow-up 
sample size of 82 individuals are much too small. The 
requirement to adjust the results for age, educational 
status and baseline values further weakened the power 
of the tests to identify differences between groups or 
over time. Thus many of the negative results in the 
study could be due to Type II errors - that is, failure to 
identify important differences between groups because 
the study sample was too small. Moreover, several 
of the indices used in the neuropsychological battery 
employed are not normally distributed, so it would be 
preferable to use non-parametric tests (e.g., Mann- 
Whitney tests) to compare the results across groups. 
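
As an illustration of the non-parametric alternative mentioned above, the following sketch computes a Mann-Whitney U statistic with a tie-corrected normal approximation. It was not run on the study data: the two lists are hypothetical error counts invented for the example, the function name `mann_whitney_u` is illustrative, and no continuity correction is applied. The normal approximation is rough for very small samples, and the function will fail if every observation is identical.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Mann-Whitney U for sample x versus y.
    Returns (U, two-sided p-value) using a tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    # U counts the pairs in which x exceeds y; tied pairs count one half.
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    tie_sizes = Counter(list(x) + list(y)).values()
    tie_term = sum(t**3 - t for t in tie_sizes) / (n * (n - 1))
    sd = sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    z = (u - n1 * n2 / 2) / sd
    return u, 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical skewed error counts (not study data).
intervention_errors = [0, 0, 0, 1, 0, 0, 2, 0]
control_errors = [0, 1, 3, 0, 2, 1, 0, 4]

u_stat, p_value = mann_whitney_u(intervention_errors, control_errors)
print(f"U = {u_stat}, two-sided p = {p_value:.3f}")
```

Because the test uses only the rank order of the observations, it does not depend on the scores being normally distributed, which is the property that makes it attractive for count measures such as the number of errors.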

4.3 Implications 

The belief that a short cognitive training program can 
have a prolonged effect on the cognitive functioning 
of elderly individuals is attractive, but probably not 
realistic. Changing the long-term trajectory of cognitive 
functioning, particularly in the elderly cohort who 
are experiencing a natural decline in their mental 
functioning, will probably require sustained and 
repetitive effort to encourage elderly individuals to 
adopt a 'cognitively healthy lifestyle', just as earlier 
public health efforts focused on getting middle-aged and 
elderly adults to adopt a 'heart-healthy' lifestyle. 

Given the advanced age of respondents in these 
studies there is inevitably going to be a high dropout 
rate as the follow-up period is extended. Most of these 
dropouts are not preventable; many respondents 
develop serious illnesses that prevent participation, 
some die and some move away (often to live with their 
children). Only about one-third (22/68) of those who 
dropped out in our study withdrew consent, so even if 
it is possible to make the programs so engaging that all 
participants are willing to continue participation, there 
will still be relatively high dropout rates. Sample sizes 
for such studies need to take this into consideration 
and the analysis of long-term outcomes must assess the 
possibility that there are differential rates and types of 
dropouts in the intervention and control groups and, if 
so, adjust the results accordingly. 

When using comprehensive neuropsychological 
batteries with dozens of measures to compare groups 
or to compare a single group over time, the likelihood of 
identifying statistically significant differences by chance alone is greatly 
increased due to the number of statistical tests being 
conducted, particularly if the sample is large. In this 
scenario the 'significant' tests or measures will change 
every time the intervention is repeated and researchers 
may exhaustively debate the reasons for the differences 
in their studies without realizing that many of these 
results are statistical artifacts. To prevent this from 
happening, intervention research for cognitive training 
must move towards hypothesis-based testing in which 
the effectiveness of the intervention is judged on a small 
number of specific measures identified before starting 
the intervention. The fishing expeditions for 'significant 
variables' that many researchers currently undertake 
using the huge neuropsychological batteries currently 
available will not advance knowledge in the field. 
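
The scale of the multiple-testing problem described above can be made concrete with a short calculation. This is a minimal sketch under the simplifying assumption that the 61 measures are tested independently, which correlated neuropsychological measures are not. It shows the expected number of false positives at a nominal level of 0.05, the chance of at least one false positive, and the Bonferroni-adjusted threshold, which is close to the required significance level of 0.0008 used in this study. The function names are illustrative.

```python
def familywise_error(alpha, m):
    """Probability of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


def bonferroni_threshold(alpha, m):
    """Per-test significance level that keeps the familywise error near alpha."""
    return alpha / m


m = 61        # number of measures compared between groups
alpha = 0.05  # nominal per-test significance level

print(f"expected false positives at {alpha}: {alpha * m:.2f}")
print(f"chance of at least one:            {familywise_error(alpha, m):.3f}")
print(f"Bonferroni threshold:              {bonferroni_threshold(alpha, m):.4f}")
```

With 61 uncorrected tests, about three nominally significant results would be expected even if the training had no effect at all, which is why a handful of measures with p-values between 0.02 and 0.05 cannot be read as evidence of benefit.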
Erratum 

In the December 2013 article 'Characteristics of the gastrointestinal microbiome in children with autism spectrum 
disorder: a systematic review' by Xinyi Cao, Ping Lin, Ping Jiang, and Chunbo Li (Shanghai Archives of Psychiatry. 2013; 
25(6): 342-353. doi: http://dx.doi.org/10.3969/j.issn.1002-0829.2013.06.003), there were five errors in figure 1: (a) 
the number of articles identified from English-language databases should have been 5962 instead of 5961; (b) the 
time range searched in the ISI Web of Knowledge should have been 1994-2013, not 1986-2013; (c) the time range 
searched in Ovid/Medline should have been 1970-2013, not 1946-2013; (d) the time range searched in PsycINFO 
should have been 1966-2013, not 1806-2013; (e) the time range searched in Cochrane Library should have been 1967-2013, 
not all years. These changes were made in the online version of the journal on January 27, 2014. 



